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1. INTRODUCTION 

Keywords are a sequence of words or phrases that represent the content of a document [1, 2]. 
Keywords are included primarily to help readers quickly understand the main idea of the document. 
Furthermore, keywords can also be utilized to support content analysis of a document, such as content 
matching [3], document categorization [4], document browsing [5], and document summarization [6]. 
However, coming up with keywords manually needs considerable time and effort [2, 7]. Moreover, 
subjectivity is another problem that may arise in manual keywords generation due to the different 
backgrounds, knowledge, experience, etc. of the authors. Automatic keyword extraction techniques provide 
a means to address this subjectivity limitation of manual keyword creation [5, 8]. 

Keywords can be extracted from both a single document or a collection of documents. Extracting 
keywords from a single document, as the focus of this paper, aims to obtain the essential representation 
of that document [9]. On the other hand, extracting keywords from a collection of documents is usually 
intended to compare and classify each document into some categories. Previous research proposed various 
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methods to extract keywords automatically in a single document. These approaches can be categorized into 

5 similar approaches: statistic approach, linguistic approach, graph-based approach, machine learning 

approach, and the combination of these approaches [10, 11]. 

Jonathan and Karnalim represented automatic keyword extraction as a classification problem. 
They utilized statistical features of candidate keywords, such as term frequency (TF), inverse document 
frequency (IDF), and the combination of both (TF-IDF) [12]. TF-IDF also used as a feature for the classifier 
by Zhang et al. [13], among other word features. Among all of these features from previous research, TF is 
the only feature applicable to single document keywords generation because IDF needs a collection of 
documents. The result from Jonathan and Karnalim [12] shows that using TF gives a decent performance, 
but not as good as using IDF. We considered TF as a scenario in our study, but we argue that using a 
classification approach with machine learning in keyword extraction could be troublesome due to 
the possible imbalance set between keyword and non-keyword candidates. 

Another method ranks candidate keywords to determine keywords from a single document based on 
the statistical features of the document content. For example, Imran et al. [14] proposed a linguistic-based 
model called visualness with lesk disambiguation (VLD) to gather candidate keywords followed by word 
sense disambiguation and visual similarity to determine the important keywords. Combining statistical and 
linguistics approaches, the top candidate of keywords are acquired by measuring how well the candidates 
describe the document title. Unfortunately, this technique only able to generate single-word keywords, 
whereas, in reality, it is common for a keyword to have more than one word [15]. 

Another approach called TextRank is proposed by Mihalcea and Tarau [16]. This method 
transformed the document text into a graph in which the nodes are the words and the edges are 
the co-occurrence between corresponding words in several sizes of windows. TextRank is then modified into 
SingleRank and ExpandRank [17] which could give an improved result over TextRank but were still 
outperformed by the TF-IDF approach [18]. TextRank is then further improved by Li and Wang [19] with 
known keywords as domain knowledge to generate the phrase network. The addition of domain knowledge 
provides a better result than the original TextRank, but domain knowledge of document is not always 
available. On a similar approach using graph-based, Lahiri et al. [20] examined a number of centrality 
measures such as degree, weighted degree, and PageRank to determine the rank of keyword candidates. 
Their result showed that even though there was an improvement, the use of centrality measures could not 
outperform the use of TF and TF-IDF. 

A different approach is proposed by Bhowmik [21] which generates candidate keywords from 
the title and the abstract of a document using the Perceptron Training Rule. Bhowmik also used the location 
of keyword candidates in a paragraph as the weight for selecting those candidate keywords. The result 
showed that using only the document’s abstract is not sufficient for keywords generation. Lu et al. [22] 
proposed another approach by utilizing the references of the document as the keywords’ source using the 
LocalMax algorithm. Both Bowmik [21] and Lu et al. [22] provided performance evaluations by comparing 
the generated keywords with the actual keywords from the document dataset. We adopt this approach to 
evaluate the performance of our keywords generator. 

Different from other previous studies, Shukla and Kakkar proposed an approach called Noun-Phrase 
Chunking [23]. This method only considers noun phrases, which are manually specified, to extract phrases 
from the text document. This chunking method only extracts candidate keywords with the longest phrase. 
All extracted candidate keywords were assigned as keywords, without any selections or evaluations. 

The phrase chunking method is straightforward because it uses part of speech (POS) patterns to 
extract the keyword’s phrases. This method is also flexible in terms of providing control to determine which 
phrase patterns used to extract the keywords. The previous phrase chunking method proposed by Shukla 
and Kakkar [23] used several self-defined patterns. We argue that using self-defined patterns may lead to 
the limited coverage of phrase pattern possibilities, i.e., some keyword patterns may appropriate but not 
selected. In this paper, we proposed to address the limitation of this phrase chunking method by 
automatically generate patterns using a collection of documents. Therefore, the contributions of this research 
are as follows: 

a. We proposed approaches for generating phrase patterns from a collection of documents, as an alternative 
to self-defined patterns proposed by Shukla and Kakkar [23], 

b. We attempted to reduce the number of keywords by conducting a candidate reduction as a part of our 
experiment, 

c. We conducted experiments using several keywords extraction scenarios, such as different sources of 
keywords, candidate keywords selections, and phrase patterns variations, and provide performance 
comparisons between those different scenarios. 

Our experiment include a replication of the method proposed by Shukla and Kakkar [23], but for 
Bahasa Indonesia, with several adjustments: 1) we generated phrase patterns from the documents instead of 
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using self-defined patterns, 2) we reduced the number of yielded keywords from candidate instead of using 
all candidates as keywords. Compared to their method with the aforementioned adjustments, our result 
showed an improvement of performance based on the f-measure of the extracted keywords. 


2. RESEARCH METHOD 

We proposed an automatic keyword extraction technique using phrase chunking. Figure 1 shows 
the methodology of this research. First, we collected and prepared a collection of documents as our dataset. 
Then, we labeled each word in every document in the dataset with its corresponding part of speech (POS) 
automatically by utilizing an automated tagger. Afterward, we developed phrase patterns based on the part 
of speech of the keywords in the dataset. Then, we employed phrase chunking, i.e., extracting candidate 
keywords from the documents using the phrase patterns. Lastly, we conducted experiments in keywords 
selection using a number of scenarios and evaluate the results. The remainders of this section explain each 
step in more detail. 








Candidate 
Document i Phrase Pattern 
Preparation POS Tagging Development 3 Sowo 

















Keyword 


Evaluation /<«— Experiment \ Selection 























Figure 1. Methodology 


2.1. Document collection and preparation 

In this research, we used E-Jurnal Akuntansi from Universitas Udayana [24] as our dataset. 
This journal is an open-access journal in the accounting field written in Bahasa Indonesia. We randomly 
selected 150 publications from volume 14-2016 to volume 18-2017 of the journal as our dataset. 
We divided each document into three different parts, namely abstract, keywords, and content text (or 
the main text of the document). We used these different parts separately in this research. The abstract and 
content text is used as the source for the keyword extraction, while the keywords are used to generate 
the phrase patterns. We omitted the title, section headers, and references. After collecting the documents, 
several pre-processing techniques, namely non-ASCII characters removal, punctuation spacing, newlines 
removal, and excessive white spaces removal, were conducted. These pre-processing were performed to each 
part (i.e., abstract, keywords, and content text) of the dataset. 


2.2. POS tagging 

After document preparation, we labeled each word of each document in the dataset with its 
corresponding part of speech, i.e., POS tagging. The labeling process was conducted using the Indonesian 
POS Tagger tool and the tagset provided by Wicaksono and Purwarianti [25]. In the abstract and the content 
text, POS tagging was applied directly to the text. Labeling part of speech to the keywords is not trivial since 
keywords are unlike the abstract or the content text, which consists of sentences. Thus, POS tagging 
keywords was conducted by utilizing the POS sequence of the keywords found in both the abstract 
and the content text. For example, to label the keyword “rasio keuangan”, we used the labeled abstract 
and content text and search for “rasio keuangan” as a phrase in the text, not just a single word of “rasio” or 
“keuangan”. If the phrase is found in the text, we labeled the keywords phrase with the POS tags of that 
phrase found in the text. In the case of more than one POS sequence found, because a keyword phrase may 
occur more than once in the text, the keyword will be labeled with the most frequently occurred POS 
sequence. We used the POS tagged keywords set in the next step, phrase pattern development, to extract 
keywords from the tagged abstract and content text. 


2.3. Phrase pattern development 

Phrase pattern is defined as the POS tag sequence that forms a keyword phrase. Phrase patterns are 
used to obtain the candidate keywords from the text. For example, for a phrase pattern "NN JJ” (a noun 
followed by an adjective), we extracted all phrases in the text having the ”NN JJ” tag sequence as candidate 
keywords. In the previous study, Shukla and Kakkar manually defined the phrase patterns [23]. In this 
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research, we used an automated approach by collecting the phrase patterns from all tagged keywords in 
the dataset, similarly resembling the knowledge domain from prior research [19]. To select the criteria for 
the experiment, we also collected the patterns’ occurrences in the dataset as the patterns’ weight. Table 1 
showed examples of phrase patterns found in the dataset. For example, in Table 1, phrase pattern ”NN NN” 
(or a noun followed by another noun) occurred 235 times in the dataset, which then becomes the weight of 
that pattern (i.e., phrase pattern weight). 


Table 1. Some examples of extracted phrase patterns and their weight 








Phrase Pattern Phrase Pattern Weight 
NN NN 235 
NN NN NN 44 
NN JJ 15 





2.4. Phrase chunking 

After having a list of phrase patterns from the dataset, we extracted candidate keywords from 
the abstract and the content text using phrase chunking. We collected all phrases from the POS tagged text in 
the dataset (tagged abstracts and content texts), having the same phrase pattern from our phrase patterns list. 
Figure 2 shows an example of a phrase chunking process. Using “NN NN” as the phrase pattern, we extracted 
“kinerja/NN perusahaan/NN”, “kondisi/NN keuangan/NN”, etc. as the candidate keywords from the tagged 
content text. 

During the process, we found phrases with overlapping words. For example, “kondisi keuangan” 
and “keuangan perusahaan” are overlapped with “kondisi keuangan perusahaan”. We kept all those phrases 
for candidate selection in the next step. This approach is different from the approach proposed by Shukla and 
Kakkar [23] because they only consider the longest overlapping words as the candidate. Using this example, 
Shukla and Kakkar’s approach will only consider “kondisi keuangan perusahaan” but not “kondisi 
keuangan” and “keuangan perusahaan” as the candidates. We argue that our approach will be beneficial 
since a shorter phrase still can become a keyword. Table 2 shows several examples of candidate keywords 
extracted from the dataset. We recorded the occurrences of each candidate in the dataset as 
the candidate weight along with the phrase pattern weight found in the previous step (section 2.3.). We used 
these weights in the keywords selection process. 




















Tagged Text Phrase Pattern 
Kinerja/NN perusahaan/NN menggambarkan/VBI NN NN 
mengenai/VBT kondisi/NN keuangan/NN perusahaan/NN NN NN NN 
yang/SC tercermin/VBT melalui/VBT prestasi/NN kerja/NN = NNJJ 
dalam/IN suatu/DT periode/NN yang/SC dianalisis/NN 





dengan/IN rasio/NN keuangan/NN ./. Pengukuran/NNP 
kinerja/NN perusahaan/NN dilakukan/VBT untuk/IN 
melihat/VBT apakah/WP hasil/NN yang/SC dicapai/VBT 
perusahaan/NN sudah/MD sesuai/RB dengan/IN 
perencanaan/NN ./. s.s.s... 











Extracted Candidate 
kinerja perusahaan, kondisi keuangan, keuangan perusahaan, prestasi kerja, rasio keuangan, 











kondisi keuangan perusahaan, ............... 





Figure 2. Illustration of the phrase chunking process 


Table 2. Example of extracted candidate keywords 








Candidate Candidate Weight Phrase Pattern Phrase Pattern Weight 
kinerja perusahaan 2 NN NN 235 
kondisi keuangan 1 NN NN 235 
keuangan perusahaan 1 NN NN 235 
prestasi kinerja 1 NN NN 235 
rasio keuangan 1 NN NN 235 
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2.5. Keyword extraction 

Before conducting keywords selection, we removed keywords from the list of candidate keywords that 
occurred only once in the source text. We argue keywords that occurred only once in a document are not 
sufficient to represent the main idea of that document. Afterward, we sorted the remaining candidates by their 
candidate weight and phrase pattern weight. In keywords selection, we sorted the candidate keywords using 
several scenarios of multiplication between the candidate weight and phrase pattern weight of each candidate 
keyword (as the sorting base). The top five candidates based on the sorting base are taken as the keywords. 
In another case, if other candidates have the same sorting base value as in the fifth position of sorted keywords, 
those candidates are also taken as keywords. For example, if the sixth and seventh position candidates have 
the same sorting base value as the fifth position keywords, both candidates are taken as keywords as well. 
Another case in which there are several candidates with the same sorting base value, the sorting is conducted 
based on the length of the phrase: the longer phrase sorted higher than the shorter one. For example, if both 
candidate keywords “kinerja keuangan perusahaan” and “kinerja keuangan” have the same sorting base value, 
the keyword “kinerja keuangan perusahaan” will be positioned higher than the keyword “kinerja keuangan”. 


2.6. Experiment 

In this section, we explained the scenarios that we used in the keyword extraction experiments. 
Our dataset consists of abstracts and content texts that have been labeled with the part of speech. The dataset 
was then divided into training data and testing data. The training data was used as the source to develop 
phrase patterns that were used to extract keywords in the testing data. We then compared the keywords 
generated using several keyword selection scenarios with the actual keywords from the documents in our 
dataset and evaluate each scenario using precision, recall, and f-measure. From 150 documents in our dataset, 
we use 5-fold cross-validation. Thus, we have 120 documents as the training data and 30 documents as 
the testing data for each fold. 

In the experiment, we evaluated several scenarios of our method, which are based on 4 parameters: 

- (P1) Source of the keywords extraction 

In this parameter, we specified the source for the candidate keywords extraction. The values for this 
parameter are either the abstracts or the content texts from the dataset as the source. 
- (P2) Phrase patterns 

This parameter specifies the phrase patterns used in the experiments. The values defined for this 
parameter are short phrase patterns (i.e. phrases that consist of only 2 or 3 words) or all phrase patterns. 
The objective of using this parameter was to determine whether a shorter phrase pattern gives better 
performance over all phrase patterns, as used in the previous study [23], in generating keywords. 

- (P3) Candidate keywords reduction 

Candidate keywords reduction is the removal of some keywords from the candidate keywords list. 
A keyword is removed if there is a sub-string or super-string of the keyword that has a higher-order based on 
the sorting base value (see parameter P4). For example, a keyword “rasio keuangan” will be removed from 
the candidate keyword list if “rasio” has a higher weight, as a sorting base value, in the candidate keywords list. 
In this parameter, we specified whether we utilize candidate reduction or not in the experiments. We were 
inspired to use candidate reduction from prior research [23], such that there are no overlapping keywords. 

- (P4) Candidate keywords sorting base 

In this parameter, we specified sorting scenarios to select the keywords from the list of candidate 
keywords. We used candidate keywords’ weights and phrase patterns’ weight as the sorting base. The sorting 
scenarios of the experiments are as follows: 

A: Descending sort based only on the candidate keywords weight, without considering the phrase 
pattern weight. 

: Descending sort based on the multiplication of the candidate weight and the phrase pattern weight. 

: Descending sort based on multiplication of the candidate weight and the sub-linear TF normalized phrase 
pattern weight. 

: Descending sort based on the multiplication of the candidate weight and the maximum TF normalized 
phrase pattern weight. 

In the last two scenarios, the normalization is carried out to address the imbalance between each phrase 

pattern weight, as exemplified by Table 2. 

All of the parameters are combined as the experiment scenarios. The top 5 of phrase patterns in each 
fold are shown in Table 3, meanwhile all 32 scenarios and their parameters in the experiment are shown in 
Table 4. Each scenario is evaluated by comparing the keywords generated with the actual keywords from 
the dataset and evaluate them using precision, recall, and f-measure. These metrics are commonly used to 
measure the performance of automatic keyword extraction [26]. 


7 aw 
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3. RESULTS AND ANALYSIS 

In the experiment, the phrase patterns were extracted from each fold of the training set. Table 3 shows 
the top five phrase patterns found in the training set as examples. From the table, one can see that 
“NN NN” occurred significantly higher (denoted by the pattern’s weight) than the other patterns. To minimize 
the differences, a normalization technique is employed as the sorting scenario’s parameters of the experiment. 
The experiment result for all scenarios is shown in Table 4. From this result, scenario 13 gives the best 
f-measure. That means the best result was obtained if we used the abstract as the source of the keyword 
extraction, used only short phrase patterns with candidate reduction, and sorted the candidate keywords based 
on their weights. In the next subsection, we discussed the performance of each parameter. 


Table 3. Top 5 phrase pattern found in each fold 








Fold 1 Fold 2 Fold 3 Fold 4 Fold 5 
Pattern Weight Pattern Weight Pattern Weight Pattern Weight Pattern Weight 
NN NN 235 NN NN 241 NN NN 241 NN NN 254 NN NN 249 

NN 89 NN 82 NN 82 NN 69 NN 84 
NN NN 44 NN NN 43 NN NN 43 NN NN 38 NN NN 39 
NN NN NN NN NN 
NN JJ 15 NN JJ 21 NN JJ 21 NN JJ 16 NN JJ 16 
NN NN 9 NN VBI 8 NN VBI 8 NN NN 9 NN NN 7 
NN NN NN NN NN NN NN NN 





Table 4. Result of all extraction scenarios 








Scenario Parameter Result 
Pl P2 P3 P4 Precision Recall F-measure 

1 Abstract All Pattern No A 0.16 0.397 0.218 
2 Abstract All Pattern No B 0.314 0.483 0.372 
3 Abstract All Pattern No C 0.199 0.324 0.241 
4 Abstract All Pattern No D 0.221 0.354 0.267 
5 Abstract All Pattern Yes A 0.134 0.266 0.174 
6 Abstract All Pattern Yes B 0.317 0.489 0.375 
7 Abstract All Pattern Yes C 0.153 0.239 0.184 
8 Abstract All Pattern Yes D 0.184 0.282 0.219 
9 Abstract Short Pattern No A 0.368 0.58 0.424 
10 Abstract Short Pattern No B 0.411 0.528 0.446 
11 Abstract Short Pattern No Cc 0.403 0.53 0.44 
12 Abstract Short Pattern No D 0.398 0.52 0.434 
13 Abstract Short Pattern Yes A 0.458 0.584 0.492 
14 Abstract Short Pattern Yes B 0.416 0.526 0.447 
15 Abstract Short Pattern Yes C 0.415 0.526 0.447 
16 Abstract Short Pattern Yes D 0.414 0.527 0.446 
17 Content Text All Pattern No A 0.11 0.146 0.124 
18 Content Text All Pattern No B 0.285 0.371 0.318 
19 Content Text All Pattern No C 0.14 0.184 0.157 
20 Content Text All Pattern No D 0.179 0.234 0.2 
21 Content Text All Pattern Yes A 0.038 0.046 0.041 
22 Content Text All Pattern Yes B 0.292 0.374 0.324 
23 Content Text All Pattern Yes C 0.066 0.081 0.071 
24 Content Text All Pattern Yes D 0.111 0.142 0.123 
25 Content Text Short Pattern No A 0.359 0.484 0.407 
26 Content Text Short Pattern No B 0.341 0.443 0.38 
27 Content Text Short Pattern No C 0.37 0.477 0.412 
28 Content Text Short Pattern No D 0.372 0.479 0.414 
29 Content Text Short Pattern Yes A 0.36 0.473 0.403 
30 Content Text Short Pattern Yes B 0.344 0.446 0.382 
31 Content Text Short Pattern Yes C 0.366 0.468 0.405 
32 Content Text Short Pattern Yes D 0.37 0.475 0.411 





3.1. Source of keyword extraction 

Figure 3 compares the experiment using the abstract and the content text as the source of 
the keywords extraction. This figure shows that using abstract as the source gave a better result than using 
the content text. It means, although not substantial, extracting keywords from shorter and brief text like 
abstract is better than long text like the content text in our dataset. 
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Figure 3. Comparison of the scenarios with abstract and content text as the reference 


24 25 26 27 28 29 30 31 


Scenario 
OAbstract W Content text 


3.2. Phrase patterns 

Figure 4 shows the comparison between using all phrase patterns and using only short phrase 
patterns. From the figure, we could see that keywords extraction using short phrase patterns gave a better 
result compared to using all phrase patterns. Table 4 also shows that short phrase patterns always gave better 
precision and recall. We observed that this is because using all phrase patterns include using single word 
candidates. A single word candidate occurred more than a phrase candidate such that it has a higher rank than 
phrase candidates if we used the number of occurrences as the base for candidate sorting. This higher rank 
causes single word candidates to be selected as keywords, even if they are not suitable keywords. We argued 
that this can be avoided if the candidate sorting is based on not only the occurrences of the patterns. 
The result proves the utilization of short patterns is better than using all patterns used in previous work [23]. 
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Figure 4. Comparison of the scenarios based on phrase pattern used 


3.3. Candidate keyword sorting 

Figure 5 shows the experiment result using different candidate keywords sortings. The figure shows 
that the sorting scenario B, i.e., descending sort based on the multiplication of the candidate weight 
and the phrase pattern weight, has the highest f-measure score. In Figure 5, there are 5 group columns 
of comparison: the first, second, third, fifth, and sixth columns, which have Sorting B as the best method 
of sorting. The first, second, fifth, and sixth group columns are extraction with all phrase patterns. As we 
stated before, if all phrase patterns were used, then the best sorting method is using value based on 
multiplication of candidate weight and phrase pattern weight. For other comparison groups (i.e., fourth, 
seventh, and eighth column), the best results are varied, but there were no substantial differences between 
them. From this experiment, normalization reduces the significant differences of phrase patterns weights 
of each candidate keyword as shown in Table 2. However, the normalization of the phrase pattern weights 
(i.e., scenario C and D) did not give substantially better results. 
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Figure 5. Comparison of the scenarios based on the sorting 


3.4. Candidate keyword reduction 

Candidate reduction in our proposed method did not always give a better result. Figure 6 shows only 
7 out of 16 scenario comparisons where using candidate reduction provides a better result, namely, scenario 2, 
9, 10, 11, 12, 18, and 26. From Table 4, we can see that scenario 2, 10, 18, and 26 are the scenarios which used 
Sorting B as the sorting method, while scenario 9, 10, 11, and 12 are scenarios that used only the shorter-phrase 
patterns from the abstract. In this case, we could see that candidate reduction gave a better result when 
the candidate sorting was based on the multiplication of candidate weight and phrase pattern weight without 
normalization, or using the shorter-phrase patterns from the abstract. The best result from this experiment was 
obtained using candidate reduction. This result also implicates that the utilization of candidate reduction or 
similarly regular expression like previous research [23] does not always improve keywords extraction results. 
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Figure 6. Comparison of the scenarios without and with the reduction 


4. CONCLUSION 

This research focused on automatic keywords extraction using phrase chunking for a single 
document written in Bahasa Indonesia. We used phrase patterns from actual keywords in our dataset for 
the phrase chunking. Then, we experimented using several scenarios defined from several parameters to 
generate the keywords. The best result was gained when the extraction was conducted using the abstract with 
shorter-phrase patterns (i.e., consists of only 2-3 words), candidate sorting based on the occurrence frequency 
of each candidate, and applying candidate reduction. 

From this research, we concluded that using the abstract gave a better result of keyword extraction 
than content text. Our result also shows that the extraction using shorter-phrase patterns gave a better result 
than using all phrase patterns. This result is aligned with the fact that most of the keywords in papers written 
in Bahasa Indonesia are in phrases form. Candidate reduction in this research did not always give better 
results in all scenarios, except when the sorting method was based on candidate weight and phrase pattern 
weight without any normalization or if shorter-phrase patterns were used. Our adjustments in using generated 
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patterns and reducing the number of selected keywords also showed substantially better results than prior 
research, which used all phrase patterns and candidate reduction. Future research of keywords extraction with 
phrase chunking can use other selection methods for candidate selection combined with other methods of 
candidate weighting. Another experiment can also be carried out by combining the abstract and the content 
text as the source of keyword extraction algorithms. 
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